1 Project overview

The broad objective of this grant was to develop a generally applicable theory of
performance of information-level fusion that

• provides accurate prediction of post-fusion algorithm accuracy in uncertain
environments.

•  determines  factors  affecting  fundamental  performance  tradeoffs,  e.g.,  sample 
size,  resolution,  specificity,  and  sensitivity  of  sensors. 

•  specifies  performance  benchmarks  allowing  quantitative  comparison  of  different 
fusion  algorithms. 

•  provides  guidelines  for  algorithm  design  and  optimization. 

The effort focused on information theoretic fusion methods, and our analysis was
based on geometric properties of information. Our research has impacted application
domains where information theoretic fusion is applied. These include georegistration,
remote sensing, multimodality anomaly detection, visualization, and dimensionality
reduction.
2 Technical accomplishments

Our technical accomplishments are reported in several papers, submitted or published,
that are cited in the references below. These papers make the following contributions:

1. We have obtained the sharpest asymptotic expressions to date for estimator
bias and variance, together with a central limit theorem (CLT), for a wide class of
information divergence estimators [6], [7]. The utility of these expressions is that
they can be used to optimize over tuning parameters of the fusion criterion, thereby
circumventing the need for manual parameter tuning. This theory has been applied to
non-parametric estimation of the mutual information [8] and estimation of intrinsic
dimension [9].

2. A new framework was introduced, involving boundary compensation and weighted
estimation, to significantly enhance information divergence estimator performance
[10], [11]. These estimators have provably better performance than state-of-the-art
divergence estimators proposed by Leonenko, Barishnokov and others.

3.  A  new  framework  for  entropy  estimation  using  maximum  entropy  principles  was 
introduced  leading  to  an  upper  bound  on  the  true  fusion  criterion  (entropy  and 
relative  entropy)  [12].  When  the  boundary  is  known,  this  maximum  entropy 
method  is  competitive  with  the  non-parametric  methods  of  entropy  estimation 
discussed  above. 

4. A new approach was proposed for estimating parameters of non-parametric topic
models [13, 3, 1]. Topic models are useful for text data and other discrete “soft
information” sources. The new method merges soft and hard information. The
confidence constraint approach was also considered in the general context of a
multiple observation setup [2].

Each of these accomplishments is briefly described in the following paragraphs. A
summary of the research accomplishments in non-parametric entropic fusion methods
(first two items above) was published as part of the proceedings of the 2011 Defense
Applications of Signal Processing Workshop held in Coolum, Australia [15].

2.1  Expressions  for  divergence  estimator  bias,  variance  and  a  CLT 

The key to performance-driven fusion is an accurate theory of performance that can
be used to identify the important underlying factors. Our focus has been on deriving
the asymptotic bias and mean-squared error (MSE), in the regime of large sample size,
of estimators of information divergences between feature distributions. Information
divergences are used as objective functions that are minimized or maximized during the
process of image registration, blind deconvolution, source separation, model selection
and other algorithms relevant to fusion. For a broad class of density plug-in estimators,
which includes the common kernel density and k-nearest neighbor (kNN) plug-in
estimators, we have developed a generally applicable theory that gives analytical
closed-form expressions for asymptotic bias and MSE in terms of the sample size, the
dimension of the feature space, and the underlying feature probability distribution.
These results appear in the technical report [6] co-authored by co-PIs Hero and Raich
and the supported University of Michigan graduate student Kumar Sricharan. This report
incorporates comparisons to state-of-the-art divergence and entropy estimation
algorithms and is an extension of the report cited in last year’s progress report. A
paper that covers the basic convergence results was submitted to the IEEE Transactions
on Information Theory and is currently under revision. The asymptotic theory developed
in [6] was applied to anomaly detection in [16]. This has led to the fastest and most
reliable anomaly detection method to date, which is applicable to large datasets with
millions of samples. The asymptotic theory was applied to intrinsic dimension estimation
in [9], which was implemented for fusion and segmentation of hyperspectral imagery in
[15]. The theory was also applied to high dimension correlation screening [17].
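
To make the class of estimators concrete, the following is a minimal sketch of a k-NN
density plug-in estimator of Shannon entropy. It is an illustration under stated
assumptions rather than the estimator analyzed in [6]: the function name
`knn_plugin_entropy`, the leave-one-out neighbor search, the brute-force distance
computation and the Gaussian test sample are all hypothetical choices, and no bias
correction is applied.

```python
import math
import numpy as np

def knn_plugin_entropy(x, k):
    """Plug-in Shannon entropy estimate (in nats) built from k-NN density estimates."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    m, d = x.shape
    # Brute-force pairwise Euclidean distances (O(M^2) memory; fine for a small sketch).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself, so column k is its k-th neighbor.
    r_k = np.sort(dist, axis=1)[:, k]
    # Volume of the unit ball in d dimensions.
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    # Leave-one-out k-NN density estimate at each sample point.
    density = k / ((m - 1) * unit_ball * r_k ** d)
    # Plug the density estimates into the entropy functional -E[log f].
    return float(-np.mean(np.log(density)))

# Hypothetical check on a standard normal sample, whose entropy is 0.5 * log(2 * pi * e).
rng = np.random.default_rng(0)
sample = rng.standard_normal(1500)
estimate = knn_plugin_entropy(sample, k=20)
reference = 0.5 * math.log(2 * math.pi * math.e)
print("estimate:", estimate, "reference:", reference)
```

The error of such an uncorrected plug-in estimate depends on the sample size, the
dimension and the tuning parameter k, which is the kind of dependence that closed-form
bias and MSE expressions are meant to quantify.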

2.2  Weighted  and  boundary  compensated  divergence  estimators 

The fusion criteria that are studied in this project are all derived from entropy
functions of the underlying probability density of features of the data. These features
are arbitrary, and in any fusion application the probability densities are unknown. To
be implemented for fusion, these criteria need to be determined accurately from data,
and we developed estimators and confidence intervals in [6] based on k-nearest neighbor
(k-NN) density estimates. The theory developed in [6] has motivated two types of
improvements to these k-NN estimators that translate into reduction in bias and
significantly enhanced performance.

The first improvement is boundary compensation. If the range of amplitudes of the data
is bounded, then there can be severe bias in the k-NN density estimator. This bias does
not decrease as the number of samples increases. We derived a new boundary compensated
divergence estimator that forces the bias at the boundary to decrease without affecting
the bias at interior points. This estimator does not require knowledge of the boundary
of the underlying density. The boundary compensated k-NN estimator was published in a
paper [10], co-authored by UM GSRA Sricharan and
co-PIs Hero and Raich, in the Proceedings of the IEEE Workshop on Machine Learning
for Signal Processing (MLSP), Aug 2010.

The second improvement is a weighted version of the k-NN plug-in entropy estimator of
divergence. In [6] we show that the bias of the standard k-NN plug-in estimator appears
as a series of terms that decay with rates $(k/M)^{j/d}$, $j = 2, 4, 6, \ldots$, where
M is the total number of samples and d is the dimension of the feature space. We define
a new estimator by forming the weighted average of k-NN plug-in entropy estimators
implemented with different values of k ($k = 1, 2, 3, \ldots$). By judicious choice of
the weighting coefficients we show that this new estimator achieves a much faster,
d-independent convergence rate of order $M^{-1/2}$. The weighted k-NN entropy estimator
was published in a paper [11], co-authored by UM GSRA Sricharan and co-PIs Hero and
Raich, in the Proceedings of the IEEE 2011 Workshop on Statistical Signal Processing
(SSP).
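
The weighting idea can be sketched as follows, assuming the bias of the k-NN plug-in
estimator is a sum of terms c_j (k/M)^(j/d), j = 2, 4, ..., with unknown constants c_j.
Under that assumption, weights that sum to one and annihilate the leading terms cancel
those bias contributions in the weighted average. The function name
`bias_cancelling_weights`, the minimum-norm choice among feasible weights and the
synthetic per-k estimates are illustrative assumptions, not the construction of [11].

```python
import numpy as np

def bias_cancelling_weights(ks, d, num_terms):
    """Minimum-norm weights with sum(w) = 1 and sum_k w_k * k**(j/d) = 0, j = 2, 4, ..."""
    ks = np.asarray(ks, dtype=float)
    exponents = 2.0 * np.arange(1, num_terms + 1) / d   # j/d for j = 2, 4, ..., 2*num_terms
    scaled = ks / ks.max()   # rescaling leaves the zero constraints unchanged, helps conditioning
    # Row 0 enforces sum(w) = 1; every further row enforces one bias-cancelling constraint.
    a = np.vstack([np.ones_like(ks)] + [scaled ** e for e in exponents])
    b = np.zeros(a.shape[0])
    b[0] = 1.0
    # For an underdetermined system, lstsq returns the minimum-norm solution of a @ w = b.
    w, *_ = np.linalg.lstsq(a, b, rcond=None)
    return w

# Hypothetical per-k estimates that follow the assumed bias model exactly.
ks = np.arange(5, 25)
d, m, true_value = 4, 10_000, 1.0
per_k_estimates = true_value + 0.7 * (ks / m) ** (2 / d) - 0.3 * (ks / m) ** (4 / d)
weights = bias_cancelling_weights(ks, d, num_terms=2)
combined = float(weights @ per_k_estimates)
print("plain average:", per_k_estimates.mean(), "weighted:", combined)
```

Choosing the minimum-norm solution among all feasible weight vectors is a heuristic way
to limit the variance inflation caused by negative weights.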
2.3 Feature selection for entropy estimation using the maximum entropy principle

To simultaneously address the problem of feature selection and entropy estimation we
considered a parametric approach to density estimation. The logarithm of the
probability density function is estimated as an m-term approximation over a large
dictionary. Using a greedy approach for the m-term approximation, we show in [12] that
entropy can be estimated with accuracy O(…). We considered two estimators. The first
considers brute-force estimation of entropy based on an m-component density
approximation. Although this estimator is not practical (i.e., the computational
complexity associated with it is prohibitive), its accuracy is analyzed and used as a
baseline for comparison. The second estimator uses a greedy approach for the m-term
approximation, reducing the optimization to one component at a time and thus enabling a
significant reduction in computational complexity relative to the first algorithm. For
each algorithm, we were able to show (under specific conditions) that the entropy
estimation error is O(…). The paper [12] is co-authored by OSU GRA Behrouz Behmardi and
the co-PIs Raich and Hero and was published in the IEEE Proc. of the 2011 Intl. Conf. on
Acoustics, Speech, and Signal Processing.
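
The greedy m-term construction can be illustrated with a small sketch. It assumes
one-dimensional data supported on [0, 1], a cosine dictionary and numerical integration
on a grid; the function name `greedy_maxent_entropy` and all of these choices are
hypothetical and are not taken from [12]. At each step the dictionary element whose
empirical mean is worst matched by the current model is selected, and only its
coefficient is optimized.

```python
import numpy as np

def greedy_maxent_entropy(x, m, dict_size=20, grid_size=512):
    """Entropy estimate (nats) from a greedy m-term exponential-family fit on [0, 1]."""
    x = np.asarray(x, dtype=float)
    freqs = np.arange(1, dict_size + 1)
    grid = (np.arange(grid_size) + 0.5) / grid_size               # midpoint rule on [0, 1]
    phi_grid = np.cos(np.pi * np.outer(grid, freqs))              # dictionary on the grid
    data_mean = np.cos(np.pi * np.outer(x, freqs)).mean(axis=0)   # empirical moments
    theta = np.zeros(dict_size)                                   # log p = phi @ theta - log Z

    def model_mean(th):
        # Expected dictionary values under the current model density.
        log_p = phi_grid @ th
        p = np.exp(log_p - log_p.max())
        p /= p.sum()
        return p @ phi_grid

    for _ in range(m):
        # Greedy selection: component with the largest moment mismatch (likelihood gradient).
        j = int(np.argmax(np.abs(data_mean - model_mean(theta))))
        # The mismatch is monotone in theta[j], so bisection finds its root.
        # The bracket of +/- 10 around the current value is an assumption of this sketch.
        lo, hi = theta[j] - 10.0, theta[j] + 10.0
        for _ in range(50):
            theta[j] = 0.5 * (lo + hi)
            if data_mean[j] > model_mean(theta)[j]:
                lo = theta[j]
            else:
                hi = theta[j]

    log_z = np.log(np.mean(np.exp(phi_grid @ theta)))
    # Negative average log-likelihood of the fitted model on the sample.
    return float(log_z - theta @ data_mean)

# Hypothetical example: Beta(2, 2) sample, whose differential entropy is about -0.125 nats.
rng = np.random.default_rng(0)
beta_sample = rng.beta(2.0, 2.0, size=2000)
print("greedy 4-term estimate:", greedy_maxent_entropy(beta_sample, m=4))
```

With m = 0 the model is uniform and the estimate is 0; each greedy step can only raise
the sample likelihood, so the estimate decreases as terms are added.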

2.4  Dimension  estimation  in  topic  models 

In the past few years, probabilistic topic models have been developed and applied to
problems in text document classification and computer vision. Such models provide
a probabilistic framework for characterizing a corpus of documents (or images) in the
bag-of-words representation. These results are directly applicable to fusion of soft
(textual or contextual) information and hard (sensor) information. A key feature of
such models is that a low dimensional representation is facilitated through latent topic
variables. Most inference algorithms in topic models assume a fixed number of topics
and determine the number of topics empirically. We developed a new approach
for identifying the number of topics in topic models through rank estimation. In [13]
we presented a rank minimization framework and provided sufficient conditions that
guarantee exact recovery of the number of topics. Moreover, we proposed a heuristic
convex relaxation of the rank minimization. Using simulations, we showed that the
proposed convex relaxation provides exact rank recovery under the sufficient
conditions proposed for the rank minimization problem. The core principle that allows
for a tuning-parameter-free optimization is the statistical error analysis. A similar
principle, which we developed here, was afterwards utilized in [2] for solving a setup
of multiple systems of equations in which the solutions share a similar sparsity
pattern. In [3] we presented a convex optimization framework for solving the problem in
[13] efficiently. Our approach allowed us to consider problems of real-world dimensions
(e.g., thousands of documents consisting of thousands of vocabulary words). A more
comprehensive journal version of this work [1] is currently under review. The paper
[13] is co-authored by
OSU GRA Behrouz Behmardi and was published in the IEEE Proc. of the 2011 Statistical
Signal Processing Workshop. The paper [3] is co-authored by OSU GRA Behrouz
Behmardi and was published in the IEEE Proc. of the 2011 Machine Learning for Signal
Processing Workshop. The paper [1], co-authored by OSU GRA Behrouz Behmardi,
is currently under review for the IEEE Trans. on Signal Processing. The paper [2] is
co-authored by OSU GRA Evgenia Chunikhina and is accepted for publication in the
IEEE Proc. of the Intl. Conf. on Acoustics, Speech and Signal Processing, 2012.
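
As a rough illustration of estimating the number of topics through rank, the sketch
below applies singular value soft-thresholding, which solves a nuclear-norm-penalized
least-squares problem, to an empirical word-by-document frequency matrix. This is a
simpler stand-in for the confidence-constrained formulation of [13] and [3], not a
reproduction of it: the threshold `tau` is set by hand here, whereas the approach
described above is free of tuning parameters, and the synthetic corpus and the function
name `soft_threshold_rank` are hypothetical.

```python
import numpy as np

def soft_threshold_rank(y, tau):
    """Shrink the singular values of y by tau; return the shrunk matrix and its rank."""
    u, s, vt = np.linalg.svd(y, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # (u * s_shrunk) scales the columns of u, so this rebuilds u @ diag(s_shrunk) @ vt.
    return (u * s_shrunk) @ vt, int(np.count_nonzero(s_shrunk))

# Hypothetical synthetic corpus: 3 topics with disjoint vocabularies, 100 documents.
rng = np.random.default_rng(0)
n_topics, words_per_topic, n_docs, doc_length = 3, 60, 100, 20000

# Column t is the word distribution of topic t (uniform over its own block of words).
word_given_topic = np.kron(np.eye(n_topics),
                           np.full((words_per_topic, 1), 1.0 / words_per_topic))
# Column j holds the topic proportions of document j.
topic_given_doc = rng.dirichlet(np.ones(n_topics), size=n_docs).T
# Rank-3 matrix of word probabilities per document.
word_given_doc = word_given_topic @ topic_given_doc

# Observed bag-of-words counts, normalized to empirical word frequencies.
counts = np.stack([rng.multinomial(doc_length, word_given_doc[:, j])
                   for j in range(n_docs)], axis=1)
frequencies = counts / doc_length

denoised, estimated_topics = soft_threshold_rank(frequencies, tau=0.15)
print("estimated number of topics:", estimated_topics)
```

Sampling noise makes the raw frequency matrix full rank, so counting its nonzero
singular values directly would overestimate the number of topics; shrinkage removes the
small noise-driven singular values.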

2.5  Multiclass-preserving  dimension  reduction 

Classical methods for dimension reduction (e.g., PCA, LDA) are often motivated by a
pre-specified class conditional data distribution. In our recent submission [14], we
explore a framework for dimension reduction that is based on a multi-class generalization
of the Chernoff bound applicable to both parametric and non-parametric models. We were
able to show through numerical analysis of classification error rates across multiple
datasets that the framework yields comparable (and sometimes superior) performance
to other state-of-the-art methods in dimension reduction. The objective we explored
can be regarded as an estimator of a functional of probability density functions that
aggregates pairwise dissimilarities between such densities. As this objective fits our
framework of estimators of functionals of densities, we suspect that further
development of the theory towards an MSE analysis of m-estimates (associated with
functionals of densities) would be applicable. Our results are documented in our IEEE
Trans. on Pattern Analysis and Machine Intelligence submission (March 2010) by Raich and
the graduate student Madan Thangavelu from Oregon State University [14].
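
One concrete instance of an objective that aggregates pairwise dissimilarities between
class densities is the multi-class Bhattacharyya bound (the Chernoff bound with exponent
1/2) under Gaussian class models. The sketch below evaluates that bound after a linear
projection. The Gaussian assumption, the fixed exponent and the names
`bhattacharyya_distance` and `projected_error_bound` are illustrative choices and may
differ from the objective used in [14].

```python
import itertools
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian densities."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    maha = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.125 * maha + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

def projected_error_bound(a, means, covs, priors):
    """Sum over class pairs of sqrt(p_i * p_j) * exp(-D_B) after projecting with a."""
    total = 0.0
    for i, j in itertools.combinations(range(len(means)), 2):
        d_b = bhattacharyya_distance(a.T @ means[i], a.T @ covs[i] @ a,
                                     a.T @ means[j], a.T @ covs[j] @ a)
        total += np.sqrt(priors[i] * priors[j]) * np.exp(-d_b)
    return float(total)

# Hypothetical example: three unit-covariance classes separated along the first axis.
means = [np.array([-3.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.0]), np.array([3.0, 0.0, 0.0])]
covs = [np.eye(3)] * 3
priors = [1.0 / 3.0] * 3
along_separation = np.array([[1.0], [0.0], [0.0]])   # 3-to-1 projection onto axis 0
across_separation = np.array([[0.0], [1.0], [0.0]])  # 3-to-1 projection onto axis 1
bound_good = projected_error_bound(along_separation, means, covs, priors)
bound_bad = projected_error_bound(across_separation, means, covs, priors)
print("bound along separating axis:", bound_good, "across it:", bound_bad)
```

A dimension reduction method built on such an objective would search over the
projection matrix for the smallest bound; the sketch only evaluates the objective for
two fixed projections.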

3  Personnel  supported 

The project supported the two co-PIs: Alfred Hero at Michigan and Raviv Raich at
Oregon State. It also supported several graduate students: Kumar Sricharan at
Michigan and Evgenia Chunikhina and Behrouz Behmardi at Oregon State.

4  Technology  Assists  and  Transitions 

Our efforts have been focused on developing a new theoretical framework for
performance-driven sensing and fusion. The theory was not sufficiently mature to be
transitioned through technology assists or transitions during the short (2.5 year)
duration of this grant.